AltecOnDB: A Large-Vocabulary Arabic Online Handwriting Recognition Database

نویسندگان

  • Ibrahim Abdelaziz
  • Sherif Abdou
چکیده

Arabic is a semitic language characterized by a complex and rich morphology. The exceptional degree of ambiguity in the writing system, the rich morphology, and the highly complex word formation process of roots and patterns all contribute to making computational approaches to Arabic very challenging. As a result, a practical handwriting recognition system should support large vocabulary to provide a high coverage and use the context information for disambiguation. Several research efforts have been devoted for building online Arabic handwriting recognition systems. Most of these methods are either using their small private test data sets or a standard database with limited lexicon and coverage. A large scale handwriting database is an essential resource that can advance the research of online handwriting recognition. Currently, there is no online Arabic handwriting database with large lexicon, high coverage, large number of writers and training/testing data. In this paper, we introduce AltecOnDB, a large scale online Arabic handwriting database. AltecOnDB has 98% coverage of all the possible PAWS of the Arabic language. The collected samples are complete sentences that include digits and punctuation marks. The collected data is available on sentence, word and character levels, hence, high-level linguistic models can be used for performance improvements. Data is collected from more than 1000 writers with different backgrounds, genders and ages. Annotation and verification tools are developed to facilitate the annotation and verification phases. We built an elementary recognition system to test our database and show the existing difficulties when handling a large vocabulary and dealing with large amounts of styles variations in the collected data.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Tool to Develop Arabic Handwriting Recognition System Using Genetic Approach

Problem statement: Significant movement has been made in handwriting recognition technology over the last few years. Up until now, Arabic handwriting recognition systems have been limited to small and medium vocabulary applications, since most of them often rely on a database during the recognition process. The facility of dealing with large database, however, opens up many more applications. A...

متن کامل

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is one of the considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

متن کامل

AlexU-Word: A New Dataset for Isolated-Word Closed-Vocabulary Offline Arabic Handwriting Recognition

In this paper, we introduce a new dataset for offline Arabic handwriting recognition. The aim is to collect a large dataset of isolated Arabic words that covers all letters of the alphabet in all possible shapes using a small number of simple words. The end goal is to obtain a very large database of segmented letter images, which can be used to build and evaluate Arabic handwriting recognition ...

متن کامل

Large Vocabulary Arabic Online Handwriting Recognition System

Online handwriting recognition of Arabic script is a difficult problem since it is naturally both cursive and unconstrained. The analysis of Arabic script is further complicated due to obligatory dots/stokes that are placed above or below most letters and usually written delayed in order. In addition, Arabic language is rich in morphology and syntax which makes it a must for a good online handw...

متن کامل

RWTH OCR: A Large Vocabulary Optical Character Recognition System for Arabic Scripts

We present a novel large vocabulary OCR system, which implements a 5 confidenceand margin-based discriminative training approach for model adap6 tation of an HMM based recognition system to handle multiple fonts, different 7 handwriting styles, and their variations. Most current HMM approaches are HTK 8 based systems which are maximum-likelihood (ML) trained and which try to adapt 9 their model...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1412.7626  شماره 

صفحات  -

تاریخ انتشار 2014